Project3B

Ian’s EDA Graphs


One observation in the data had a total coffee point score of 0. This is an extreme outlier, and it is also implausible: it would be hard for any cup of coffee to truly score a flat 0 without some sort of error or bias in the rating. Because a score that low could distort the models fitted later on, removing it from the data is a reasonable decision.
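A minimal sketch of that filtering step, using a small made-up data frame in place of the real coffee data. The `toy` and `toy.clean` names and all values are assumptions for illustration; only the `Total.Cup.Points` column name comes from the analysis.

```r
# Toy stand-in for the coffee data (values are invented).
toy <- data.frame(
  Total.Cup.Points = c(84.2, 82.5, 0, 83.1, 81.9),
  Species = c("Arabica", "Arabica", "Arabica", "Robusta", "Arabica")
)

# Keep only rows with a strictly positive total score.
toy.clean <- toy[toy$Total.Cup.Points > 0, ]

nrow(toy)        # 5 rows before filtering
nrow(toy.clean)  # 4 rows after the zero score is dropped
```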

ggplot(data = coffee.new, aes(x = country_group, y = Total.Cup.Points, fill = country_group)) + geom_boxplot() + labs(x = "Continent", y = "Total Coffee Points", title = "Total Coffee Points Based on Continent")

These density plots look extremely similar, but the Robusta species has a lower mean than Arabica. Because the shapes of the two density plots are so similar, Robusta appears to be about as consistent as Arabica. The lower mean may reflect something inherent to the bean, possibly genetic, that brings out less flavor, although the plots alone cannot confirm this. Overall, the two species are very comparable to one another.

We filtered out the observation with 0 total cup points since it is an extreme low outlier.

## Warning: Removed 227 rows containing missing values (`geom_point()`).

MODEL SELECTION

linear.model.all <- lm(Total.Cup.Points ~ ., data=coffee.new)
summary(linear.model.all)
## 
## Call:
## lm(formula = Total.Cup.Points ~ ., data = coffee.new)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -0.0262128 -0.0042212 -0.0008627  0.0064472  0.0201168 
## 
## Coefficients:
##                                           Estimate Std. Error t value Pr(>|t|)
## (Intercept)                             -9.014e-02  4.830e-02  -1.866   0.0647
## Number.of.Bags                           6.042e-06  8.560e-06   0.706   0.4818
## Processing.MethodOther                   5.459e-03  5.094e-03   1.072   0.2863
## Processing.MethodPulped natural / honey -9.441e-05  4.718e-03  -0.020   0.9841
## Processing.MethodWashed / Wet           -1.313e-04  2.262e-03  -0.058   0.9538
## Aroma                                    1.008e+00  7.217e-03 139.638   <2e-16
## Flavor                                   9.946e-01  8.736e-03 113.848   <2e-16
## Aftertaste                               9.978e-01  7.171e-03 139.151   <2e-16
## Acidity                                  1.006e+00  6.189e-03 162.590   <2e-16
## Body                                     1.010e+00  6.377e-03 158.357   <2e-16
## Balance                                  9.919e-01  8.000e-03 123.977   <2e-16
## Uniformity                               9.963e-01  2.712e-03 367.339   <2e-16
## Clean.Cup                                1.008e+00  5.360e-03 188.001   <2e-16
## Sweetness                                9.986e-01  4.711e-03 211.947   <2e-16
## Cupper.Points                            1.000e+00  2.502e-03 399.725   <2e-16
## Moisture                                -2.283e-02  2.319e-02  -0.985   0.3270
## Category.One.Defects                     1.181e-04  3.224e-04   0.366   0.7149
## Quakers                                 -8.601e-05  6.431e-04  -0.134   0.8939
## ColorBluish-Green                        5.468e-04  4.079e-03   0.134   0.8936
## ColorGreen                               1.820e-03  3.327e-03   0.547   0.5855
## Category.Two.Defects                    -2.665e-04  3.806e-04  -0.700   0.4852
## country_groupAsia                       -2.249e-03  5.544e-03  -0.406   0.6858
## country_groupSouth America               2.658e-03  4.239e-03   0.627   0.5320
##                                            
## (Intercept)                             .  
## Number.of.Bags                             
## Processing.MethodOther                     
## Processing.MethodPulped natural / honey    
## Processing.MethodWashed / Wet              
## Aroma                                   ***
## Flavor                                  ***
## Aftertaste                              ***
## Acidity                                 ***
## Body                                    ***
## Balance                                 ***
## Uniformity                              ***
## Clean.Cup                               ***
## Sweetness                               ***
## Cupper.Points                           ***
## Moisture                                   
## Category.One.Defects                       
## Quakers                                    
## ColorBluish-Green                          
## ColorGreen                                 
## Category.Two.Defects                       
## country_groupAsia                          
## country_groupSouth America                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.008698 on 107 degrees of freedom
## Multiple R-squared:      1,  Adjusted R-squared:      1 
## F-statistic: 4.029e+05 on 22 and 107 DF,  p-value: < 2.2e-16
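
The near-unit coefficients on the ten sensory scores, together with an R-squared of 1, suggest that Total.Cup.Points is essentially the sum of those scores, so this model recovers an identity rather than explaining quality. The sketch below reproduces the effect on simulated data; every name and value in it is made up for illustration.

```r
set.seed(1)
n <- 100

# Three made-up component scores.
s1 <- rnorm(n, mean = 7.5, sd = 0.3)
s2 <- rnorm(n, mean = 7.5, sd = 0.3)
s3 <- rnorm(n, mean = 7.5, sd = 0.3)

# The "total" is the sum of the components plus tiny rounding-like noise.
total <- s1 + s2 + s3 + rnorm(n, sd = 0.005)

fit <- lm(total ~ s1 + s2 + s3)
coef(fit)               # slopes are all close to 1
summary(fit)$r.squared  # very close to 1
```

This is one reason to leave the component scores out when the goal is to see how the other variables relate to the total.
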
library(leaps)  # provides regsubsets()
coffeeSub <- regsubsets(Total.Cup.Points ~ Category.Two.Defects + Category.One.Defects + 
    Moisture + Quakers + altitude_mean_meters + Number.of.Bags, data = coffee.clean, nbest=2)
plot(coffeeSub)

model3 <- lm(Total.Cup.Points~ Category.Two.Defects + Moisture, data = coffee.new)

summary(model3)

To perform model selection, we ran a best-subset search over candidate explanatory variables to help decide which ones to put into a linear model. The best model it found uses category two defects and moisture as the sole explanatory variables. Even for this model, the adjusted R-squared is still extremely low. Because of this, a linear model shouldn’t be used, and a different model should be found. A gamma model might be better in this scenario because the response is continuous and positive.
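
The idea behind subset selection can be sketched in base R by fitting every subset of a few predictors and ranking the fits by adjusted R-squared. This is a hand-rolled illustration on simulated data, not a description of how `regsubsets` works internally; the `toy` data frame and its columns are hypothetical.

```r
set.seed(2)
n <- 120
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
# Only x1 and x2 actually influence the simulated response.
toy$y <- 80 + 1.5 * toy$x1 - 1.0 * toy$x2 + rnorm(n)

predictors <- c("x1", "x2", "x3")

# Enumerate every non-empty subset of the predictors.
subsets <- unlist(
  lapply(seq_along(predictors),
         function(k) combn(predictors, k, simplify = FALSE)),
  recursive = FALSE
)

# Fit one linear model per subset and record its adjusted R-squared.
adj.r2 <- sapply(subsets, function(vars) {
  fit <- lm(reformulate(vars, response = "y"), data = toy)
  summary(fit)$adj.r.squared
})
names(adj.r2) <- sapply(subsets, paste, collapse = " + ")

sort(adj.r2, decreasing = TRUE)  # best-scoring subsets first
```

If even the top-ranked subset has an adjusted R-squared near zero, as in the analysis above, no combination of those predictors explains much of the response.
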

naCoffee = coffee.new %>% drop_na()
stepwise <- lm(Total.Cup.Points ~ . - Flavor - Cupper.Points - 
    Aroma - Aftertaste - Body - Acidity - Balance - Clean.Cup - 
    Sweetness - Uniformity, data= coffee.new)
model_b <- step(stepwise, direction='backward')
## Start:  AIC=203.26
## Total.Cup.Points ~ (Number.of.Bags + Processing.Method + Aroma + 
##     Flavor + Aftertaste + Acidity + Body + Balance + Uniformity + 
##     Clean.Cup + Sweetness + Cupper.Points + Moisture + Category.One.Defects + 
##     Quakers + Color + Category.Two.Defects + country_group) - 
##     Flavor - Cupper.Points - Aroma - Aftertaste - Body - Acidity - 
##     Balance - Clean.Cup - Sweetness - Uniformity
## 
##                        Df Sum of Sq    RSS    AIC
## - Quakers               1     0.007 508.31 201.26
## - Number.of.Bags        1     0.131 508.43 201.29
## - Moisture              1     1.201 509.50 201.57
## - Category.One.Defects  1     1.772 510.07 201.71
## - Category.Two.Defects  1     2.049 510.35 201.78
## <none>                              508.30 203.26
## - country_group         2    17.336 525.64 203.62
## - Color                 2    18.831 527.13 203.99
## - Processing.Method     3   112.701 621.00 223.29
## 
## Step:  AIC=201.26
## Total.Cup.Points ~ Number.of.Bags + Processing.Method + Moisture + 
##     Category.One.Defects + Color + Category.Two.Defects + country_group
## 
##                        Df Sum of Sq    RSS    AIC
## - Number.of.Bags        1     0.139 508.45 199.30
## - Moisture              1     1.272 509.58 199.59
## - Category.One.Defects  1     1.767 510.08 199.71
## - Category.Two.Defects  1     2.195 510.51 199.82
## <none>                              508.31 201.26
## - country_group         2    17.416 525.73 201.64
## - Color                 2    18.869 527.18 202.00
## - Processing.Method     3   112.780 621.09 221.31
## 
## Step:  AIC=199.3
## Total.Cup.Points ~ Processing.Method + Moisture + Category.One.Defects + 
##     Color + Category.Two.Defects + country_group
## 
##                        Df Sum of Sq    RSS    AIC
## - Moisture              1     1.298 509.75 197.63
## - Category.One.Defects  1     1.809 510.26 197.76
## - Category.Two.Defects  1     2.068 510.52 197.82
## <none>                              508.45 199.30
## - country_group         2    18.516 526.96 199.95
## - Color                 2    20.432 528.88 200.42
## - Processing.Method     3   116.229 624.68 220.06
## 
## Step:  AIC=197.63
## Total.Cup.Points ~ Processing.Method + Category.One.Defects + 
##     Color + Category.Two.Defects + country_group
## 
##                        Df Sum of Sq    RSS    AIC
## - Category.Two.Defects  1     1.905 511.65 196.11
## - Category.One.Defects  1     1.984 511.73 196.13
## <none>                              509.75 197.63
## - country_group         2    19.318 529.06 198.47
## - Color                 2    21.650 531.40 199.04
## - Processing.Method     3   118.554 628.30 218.81
## 
## Step:  AIC=196.11
## Total.Cup.Points ~ Processing.Method + Category.One.Defects + 
##     Color + country_group
## 
##                        Df Sum of Sq    RSS    AIC
## - Category.One.Defects  1     1.815 513.47 194.57
## <none>                              511.65 196.11
## - country_group         2    21.955 533.61 197.58
## - Color                 2    22.767 534.42 197.77
## - Processing.Method     3   121.884 633.53 217.89
## 
## Step:  AIC=194.57
## Total.Cup.Points ~ Processing.Method + Color + country_group
## 
##                     Df Sum of Sq    RSS    AIC
## <none>                           513.47 194.57
## - country_group      2    20.347 533.81 195.63
## - Color              2    23.698 537.16 196.44
## - Processing.Method  3   120.838 634.30 216.05

We chose the “best” linear model based on its AIC of 194.57, the lowest value reached by the backward selection.

best.linear.model <- lm(Total.Cup.Points ~ Processing.Method + Color + country_group, data = coffee.new)
summary(best.linear.model)
## 
## Call:
## lm(formula = Total.Cup.Points ~ Processing.Method + Color + country_group, 
##     data = coffee.new)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -14.0973  -0.5239   0.2059   1.0491   4.5221 
## 
## Coefficients:
##                                         Estimate Std. Error t value Pr(>|t|)
## (Intercept)                              85.2267     1.2460  68.401  < 2e-16
## Processing.MethodOther                   -5.1200     1.0011  -5.114 1.18e-06
## Processing.MethodPulped natural / honey   0.7255     1.0336   0.702   0.4840
## Processing.MethodWashed / Wet            -0.1764     0.4332  -0.407   0.6846
## ColorBluish-Green                        -1.4676     0.8864  -1.656   0.1003
## ColorGreen                               -1.6676     0.7042  -2.368   0.0194
## country_groupAsia                        -0.2088     1.0837  -0.193   0.8475
## country_groupSouth America               -1.2618     0.9431  -1.338   0.1834
##                                            
## (Intercept)                             ***
## Processing.MethodOther                  ***
## Processing.MethodPulped natural / honey    
## Processing.MethodWashed / Wet              
## ColorBluish-Green                          
## ColorGreen                              *  
## country_groupAsia                          
## country_groupSouth America                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.052 on 122 degrees of freedom
## Multiple R-squared:  0.2343, Adjusted R-squared:  0.1903 
## F-statistic: 5.332 on 7 and 122 DF,  p-value: 2.418e-05

Gamma

Gamma identity

gamma.identity <- glm(Total.Cup.Points ~ . - Flavor - Cupper.Points - Aroma - Aftertaste - Body - Acidity - Balance - Clean.Cup - Sweetness - Uniformity, family = Gamma(link = "identity"),  data = coffee.new)
summary(gamma.identity)
## 
## Call:
## glm(formula = Total.Cup.Points ~ . - Flavor - Cupper.Points - 
##     Aroma - Aftertaste - Body - Acidity - Balance - Clean.Cup - 
##     Sweetness - Uniformity, family = Gamma(link = "identity"), 
##     data = coffee.new)
## 
## Coefficients:
##                                          Estimate Std. Error t value Pr(>|t|)
## (Intercept)                             85.690852   1.606864  53.328  < 2e-16
## Number.of.Bags                           0.000128   0.001978   0.065    0.949
## Processing.MethodOther                  -5.235370   1.019400  -5.136 1.13e-06
## Processing.MethodPulped natural / honey  0.512817   1.143289   0.449    0.655
## Processing.MethodWashed / Wet           -0.181949   0.526279  -0.346    0.730
## Moisture                                -2.906229   5.167422  -0.562    0.575
## Category.One.Defects                    -0.048118   0.070833  -0.679    0.498
## Quakers                                  0.006219   0.152486   0.041    0.968
## ColorBluish-Green                       -1.325588   0.959922  -1.381    0.170
## ColorGreen                              -1.586753   0.771630  -2.056    0.042
## Category.Two.Defects                    -0.063328   0.091138  -0.695    0.489
## country_groupAsia                       -0.287921   1.305546  -0.221    0.826
## country_groupSouth America              -1.367713   1.013634  -1.349    0.180
##                                            
## (Intercept)                             ***
## Number.of.Bags                             
## Processing.MethodOther                  ***
## Processing.MethodPulped natural / honey    
## Processing.MethodWashed / Wet              
## Moisture                                   
## Category.One.Defects                       
## Quakers                                    
## ColorBluish-Green                          
## ColorGreen                              *  
## Category.Two.Defects                       
## country_groupAsia                          
## country_groupSouth America                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Gamma family taken to be 0.0006822081)
## 
##     Null deviance: 0.109580  on 129  degrees of freedom
## Residual deviance: 0.084402  on 117  degrees of freedom
## AIC: 589.49
## 
## Number of Fisher Scoring iterations: 5
gamma.identity.backward <- step(gamma.identity, direction = 'backward')
## Start:  AIC=589.49
## Total.Cup.Points ~ (Number.of.Bags + Processing.Method + Aroma + 
##     Flavor + Aftertaste + Acidity + Body + Balance + Uniformity + 
##     Clean.Cup + Sweetness + Cupper.Points + Moisture + Category.One.Defects + 
##     Quakers + Color + Category.Two.Defects + country_group) - 
##     Flavor - Cupper.Points - Aroma - Aftertaste - Body - Acidity - 
##     Balance - Clean.Cup - Sweetness - Uniformity
## 
##                        Df Deviance    AIC
## - Quakers               1 0.084403 587.49
## - Number.of.Bags        1 0.084405 587.49
## - Moisture              1 0.084617 587.80
## - Category.One.Defects  1 0.084716 587.95
## - Category.Two.Defects  1 0.084730 587.97
## - country_group         2 0.087080 589.42
## <none>                    0.084402 589.49
## - Color                 2 0.087373 589.84
## - Processing.Method     3 0.102273 609.68
## 
## Step:  AIC=587.49
## Total.Cup.Points ~ Number.of.Bags + Processing.Method + Moisture + 
##     Category.One.Defects + Color + Category.Two.Defects + country_group
## 
##                        Df Deviance    AIC
## - Number.of.Bags        1 0.084406 585.50
## - Moisture              1 0.084619 585.81
## - Category.One.Defects  1 0.084719 585.96
## - Category.Two.Defects  1 0.084739 585.99
## - country_group         2 0.087083 587.45
## <none>                    0.084403 587.49
## - Color                 2 0.087373 587.88
## - Processing.Method     3 0.102288 607.93
## 
## Step:  AIC=585.5
## Total.Cup.Points ~ Processing.Method + Moisture + Category.One.Defects + 
##     Color + Category.Two.Defects + country_group
## 
##                        Df Deviance    AIC
## - Moisture              1 0.084623 583.82
## - Category.One.Defects  1 0.084724 583.97
## - Category.Two.Defects  1 0.084758 584.02
## <none>                    0.084406 585.50
## - country_group         2 0.087372 585.92
## - Color                 2 0.087536 586.16
## - Processing.Method     3 0.102860 607.00
## 
## Step:  AIC=583.83
## Total.Cup.Points ~ Processing.Method + Category.One.Defects + 
##     Color + Category.Two.Defects + country_group
## 
##                        Df Deviance    AIC
## - Category.Two.Defects  1 0.084950 582.32
## - Category.One.Defects  1 0.084972 582.35
## <none>                    0.084623 583.83
## - country_group         2 0.087737 584.50
## - Color                 2 0.087987 584.87
## - Processing.Method     3 0.103381 605.96
## 
## Step:  AIC=582.33
## Total.Cup.Points ~ Processing.Method + Category.One.Defects + 
##     Color + country_group
## 
##                        Df Deviance    AIC
## - Category.One.Defects  1 0.085269 580.81
## <none>                    0.084950 582.33
## - Color                 2 0.088479 583.65
## - country_group         2 0.088528 583.73
## - Processing.Method     3 0.104146 605.27
## 
## Step:  AIC=580.82
## Total.Cup.Points ~ Processing.Method + Color + country_group
## 
##                     Df Deviance    AIC
## <none>                 0.085269 580.82
## - country_group      2 0.088567 581.81
## - Color              2 0.088949 582.39
## - Processing.Method  3 0.104260 603.59

“Best” identity link

gamma.best.identity <- glm(Total.Cup.Points ~ Processing.Method + Color + country_group, data = coffee.new, family = Gamma(link = "identity"))
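
As a sketch of how the same gamma model can be fitted under each of the three supported links and compared by AIC, the example below uses simulated data. The `toy` data frame, the predictor `x`, and the chosen shape parameter are assumptions for illustration and are not taken from the coffee data.

```r
set.seed(3)
n <- 150

# Made-up positive, continuous response whose mean rises linearly with x.
toy <- data.frame(x = runif(n, 0, 2))
mu <- 80 + 2 * toy$x
shape <- 1500  # large shape keeps the spread small relative to the mean
toy$score <- rgamma(n, shape = shape, rate = shape / mu)

# Fit the same formula under each link available for the Gamma family.
links <- c("identity", "log", "inverse")
fits <- lapply(links, function(l) {
  glm(score ~ x, family = Gamma(link = l), data = toy)
})
names(fits) <- links

sapply(fits, AIC)  # lower AIC is preferred
```

When the response varies over a narrow range, as it does here, the three links give very similar fits, so small AIC differences should be read cautiously.
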

Compare MSE and MAE for gamma and linear
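
The two error metrics compared in this section can be sketched with a pair of small helper functions. The names `mse()` and `mae()` and the example vectors are hypothetical and are not part of the analysis code.

```r
# Hypothetical helpers for mean squared error and mean absolute error.
mse <- function(actual, predicted) mean((actual - predicted)^2)
mae <- function(actual, predicted) mean(abs(actual - predicted))

actual    <- c(82, 84, 80, 86)
predicted <- c(83, 83, 82, 84)

mse(actual, predicted)  # (1 + 1 + 4 + 4) / 4 = 2.5
mae(actual, predicted)  # (1 + 1 + 2 + 2) / 4 = 1.5

# Without squaring or abs(), positive and negative errors cancel out.
mean(actual - predicted)  # (-1 + 1 - 2 + 2) / 4 = 0
```

The last line shows why the errors must be squared or made absolute before averaging: a plain mean of the residuals can be zero even when every prediction is off.
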

coffee.new.data <- coffee.new %>% mutate(predict.inverse = gamma.best.inverse$fitted.values, 
                                    predict.identity = gamma.best.identity$fitted.values,
                                    predict.log = gamma.best.log$fitted.values,
                                    predict.linear = best.linear.model$fitted.values)


coffee.new.data %>% summarize(MSE.inverse = mean((Total.Cup.Points - predict.inverse)^2),
                         MSE.log = mean((Total.Cup.Points - predict.log)^2),
                         MSE.identity = mean((Total.Cup.Points - predict.identity)^2),
                         MSE.linear = mean((Total.Cup.Points -predict.linear)^2))
## # A tibble: 1 × 4
##   MSE.inverse MSE.log MSE.identity MSE.linear
##         <dbl>   <dbl>        <dbl>      <dbl>
## 1        3.98    3.97         3.95       3.95
coffee.new.data %>% summarize(MAE.inverse = mean(abs(Total.Cup.Points - predict.inverse)),
                         MAE.log = mean(abs(Total.Cup.Points - predict.log)),
                         MAE.identity = mean(abs(Total.Cup.Points - predict.identity)),
                         MAE.linear = mean(abs(Total.Cup.Points-predict.linear)))
## # A tibble: 1 × 4
##   MAE.inverse MAE.log MAE.identity MAE.linear
##         <dbl>   <dbl>        <dbl>      <dbl>
## 1        1.26    1.26         1.26       1.26
AIC(gamma.best.inverse)
## [1] 581.7692
AIC(gamma.best.log)
## [1] 581.3076
AIC(gamma.best.identity)
## [1] 580.8185
AIC(best.linear.model)
## [1] 565.4986

Linear model assumption check

plot(best.linear.model, which =c(1,2))

Gamma model assumption check

plot(gamma.best.inverse, which = c(1,2))

plot(gamma.best.identity, which =c(1,2))

plot(gamma.best.log, which =c(1,2))

The Q-Q and residuals vs. fitted plots are nearly identical across all of the gamma models. In each Q-Q plot the points follow a very straight line, which supports approximate normality of the residuals. However, the residuals vs. fitted plots are not randomly scattered around the horizontal axis at all, so the plots do not support the linearity assumption for the gamma models.

Among the gamma models, the one using the identity link function appears to be the best based on its AIC value. Its AIC is 580.82, which is marginally lower than the AICs of the other two gamma models (581.31 for the log link and 581.77 for the inverse link), making it the best option.

Gamma Analysis

Country Regions Plot

ggplot(data = coffee.new, aes(x = country_group, y = Total.Cup.Points, fill = country_group)) + geom_boxplot() + labs(x = "Continent", y = "Total Coffee Points", title = "Total Coffee Points Based on Continent")